Information Retrieval from Dutch Historical Corpora

Author

  • Loes Braun
Abstract

Preface

Writing a thesis is often regarded as rather solitary labour. During my research I learned that the opposite is true: writing this thesis would not have been possible without the support of several people whom I would like to acknowledge in this preface.

First and foremost, I would like to express my gratitude to the members of my thesis committee. I would like to thank Prof. dr. H.J. van den Herik, the chair of my thesis committee, both for his valuable comments on my thesis and especially for stimulating my interest in scientific research. I acknowledge Prof. dr. A.M.J.A. Berkvens for arranging contacts with the domain experts, for commenting on my thesis, and for his interest in my research. I am sincerely grateful to Dr. F.J. Wiesman, my daily advisor, for his guidance and enthusiastic support during the writing of my thesis. Without his valuable comments my thesis would not have been written in its present form. I would like to thank Dr. I.G. Sprinkhuizen-Kuyper for answering many of my questions, especially on mathematics and statistics, and for commenting on my thesis.

Moreover, I am greatly indebted to the domain experts, Prof. dr. who participated in the various experiments which were part of my research. Without their cooperation, the research could not have been conducted as it was.

I am grateful to my parents for their continuous support and interest in my education, especially during the writing of this thesis. They were the first to teach me motivation, self-discipline, and striving for perfection, all of which I tried to pursue when writing this thesis.

Finally, I would like to give special thanks to Ron for his support and encouragement. Although our interests lie in different scientific disciplines, we had many interesting discussions which greatly inspired me during my research.

Summary

The thesis focuses on a separate area in the field of information retrieval: information retrieval from historical corpora. The goal of the research is to identify the bottlenecks of information retrieval from historical corpora and to resolve them. A collection of Dutch and Belgian law texts from the 16th and 17th centuries serves as the test corpus. The research consists of three phases: bottleneck identification, solution determination and system design, and system evaluation and comparison. In the first phase (bottleneck identification), an experiment is conducted to identify the bottlenecks of information retrieval from historical corpora. …


Similar resources

Deriving a Bilingual Lexicon for Cross Language Information Retrieval

In this paper we describe a systematic approach to derive a bilingual lexicon automatically from parallel corpora. Following this approach, a lexicon was derived from the English and Dutch versions of the Agenda corpus. With the lexicon and a part of the corpus that was not used to derive the lexicon, a bilingual retrieval environment was built. Recall and precision of monolingual Dutch retrieval wa...


Information Retrieval from Historical Corpora

With the increasing number of documents that are available in digital form, the number of digital historical documents is also increasing (Berkvens, 2001). It cannot be assumed that standard IR systems perform well on historical documents: historical texts differ from modern texts in three ways (Hüning, 1996; Van Der Horst and Marschall, 1989): (a) vocabularies have changed, (b) spelling has ch...


Historical Event Extraction from Text

In this paper, we report on how historical events are extracted from text within the Semantics of History research project. The project aims at the creation of resources for a historical information retrieval system that can handle the time-based dynamics and varying perspectives of Dutch historical archives. The historical event extraction module will be used for museum collections, allowing u...


Report on the 3rd Dutch-Belgian Information Retrieval Workshop (DIR-2002)

In the Low Countries, interest in information retrieval, the discipline that is mainly concerned with identifying information in document or multimedia collections, has been modest but steady throughout the years. In 2000, this led to the first Dutch-Belgian Information Retrieval Workshop (DIR) at the University of Maastricht (the Netherlands). Two years later, the third edition of DIR shows th...


Cross-Language Information Retrieval with Latent Topic Models Trained on a Comparable Corpus

In this paper we study cross-language information retrieval using a bilingual topic model trained on comparable corpora such as Wikipedia articles. The bilingual Latent Dirichlet Allocation model (BiLDA) creates an interlingual representation, which can be used as a translation resource in many different multilingual settings as comparable corpora are available for many language pairs. The prob...



Publication date: 2002